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Abstract.  Aalen’s  additive  risk  modei  allows  the  influence  of  covariates  on  a  hazard 
function  to  vary  over  time,  and  to  do  so  in  a  different  fashion  for  each  covariate.  Although 
allowing  greater  flexibility  than  a  Cox  model,  which  has  a  more  parsimonious  temporal  structure, 
the  number  of  covariates  that  can  be  handled  by  Aalen’s  model  is  quite  limited.  One  way  around 
this  difficulty  is  to  impose  some  a  priori  structure  on  the  form  of  the  model,  thereby  reducing  the 
number  of  functions  to  be  estimated.  In  this  paper  we  introduce  a  partly  parametric  version  of 
Aalen’s  model  in  which  only  a  small  number  of  the  covariates  are  selected  to  have  their  influence 
vary  nonparametrically  over  time,  and  the  influence  of  the  remaining  covariates  is  restricted  to  be 
constant  in  time.  Efficient  procedures  for  fitting  this  new  model  are  developed  and  studied.  The 
approach  is  applied  to  data  from  the  British  Medical  Research  Council’s  myelomatosis  trials. 
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1  Introduction 

Aalen  (1980)  proposed  the  following  additive  model  for  the  intensity  of  a  count¬ 
ing  process: 

A(t|z)  =  Or(<)'z, 

where  z  is  a  p-vector  of  covariates  and  o  is  a  p-vector  of  unknown  functions  of 
time.  The  first  component  of  z  may  be  set  to  1  to  allow  for  a  baseline  hazard. 
More  recently  estimation  in  this  model  has  been  studied  by  Huffer  and  McKeague 
(1991)  and  McKeague  (1988a,  1988b).  Greenwood  and  Wefelmeyer  (1990,  1991)  and 
Sasieni  (1992)  have  shown  that  the  Huffer-McKeague  estimator  is  asymptotically 
efficient  and  that  it  is  an  approximate  maximum  likelihood  type  estimator.  The 
model  has  had  only  limited  use  in  data  analysis  and  primarily  in  data  sets  with 
just  a  few  covariates.  Examples  may  be  found  in  Aalen  (1989),  Mau  (1986,  1988) 
and  Henderson  and  Milner  (1991).  One  reason  for  the  lack  of  use  is  that  a  sepeurate 
nonpareimetric  function  must  be  estimated  in  association  with  each  covariate.  As 
a  way  of  redressing  this  problem  we  here  introduce  a  partly  parametric  version 
of  the  additive  risk  model  in  which  the  effects  of  some  covariates  are  assumed  to 
be  constant  in  time.  This  restriction  could  be  relaxed  at  some  price  by  allowing 
a  parametric  model  for  a  component  Ot(-)-  Alternatively,  by  defining  a  new  time 
dependent  covariate,  z*{t)  =  zexp(— <)  say,  non-constant  time  effects  could  be  fit 
with  this  model  directly. 

Assume  that  the  intensity  at  t  for  an  individual  with  covariates  x  and  z  is  given 
by 

X(t\x,z)  =  a{tyx  +  0'z,  (1.1) 

with  the  covariates  being  q  and  p  dimensional  respectively.  We  are  interested  in 
estimating  ^  amd  the  vector  of  ‘cumulative  hazards’  A(-)  =  JJj  a(s)ds.  If  0  were 
known  then  one  could  use  Aalen’s  least  squares  estimator  for  A(-)  followed  by  the 
Huffer-McKeague  scheme  to  obtain  an  efficient  estimator.  Similarly,  if  «(•)  were 
known  then  one  could  estimate  ^  by  maximum  likelihood.  However,  neither  A  nor 
^  are  known  and  it  is  not  obvious  how  to  construct  efficient  estimates  although, 
intuitively,  iterating  between  estimation  of  ^  and  a  should  work.  The  approach 
used  here  is  to  look  at  the  efficient  score  equation  for  aind  to  use  this  to  obtain  a 
set  of  pseudo-normal  equations.  There  are  similarities  with  the  approach  used  by 
Sasieni  (1992)  to  motivate  the  Huffer-McKeague  estimator. 
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Various  authors,  most  recently  Lin  and  Ying  (1992),  have  considered  the  fol¬ 
lowing  additive  analogue  of  Cox’s  (1972)  proportional  hcizaxds  model: 

\{t\z)  =  aQ{t)  +  P'z.  (1.2) 

This  is  a  special  case  of  the  partly  parametric  additive  risk  model  (1.1)  obtained 
by  treating  the  baseline  hazard  function  nonparametrically  and  all  the  covaxiates 
parametrically.  The  temporal  influence  of  each  covariates  is  required  to  be  constant, 
so  it  is  considerably  less  versatile  than  (1.1).  However,  this  model,  like  the  Cox 
model,  can  be  useful  in  the  initial  exploratory  stages  when  there  are  a  very  large 
number  of  covariates.  In  an  application  to  real  data  (Section  4)  we  used  (1.2)  to 
And  the  two  most  influential  covariates,  followed  by  (1.1)  with  one  of  the  covairiates 
treated  nonparametrically.  This  led  to  a  much  better  fit  than  could  be  achieved  by 
either  (1.2)  or  the  Cox  model. 

The  paper  is  organised  as  follows.  In  Section  2  we  derive  our  efficient  estima¬ 
tors  for  /3  and  A  and  discuss  their  practical  implementation.  In  Section  3  we  discuss 
ways  to  reflne  a  partly  additive  risk  model  and  a  diagnostic  technique  which  can 
give  useful  information  about  how  much  the  estimates  would  change  if  an  individual 
observation  were  removed  from  the  data  set.  An  application  to  finding  prognostic 
factors  for  survival  among  myelomatosis  patients  is  discussed  in  Section  4.  Some 
general  discussion,  comparing  the  new  approach  with  the  standeird  Cox  model  ap¬ 
proach  to  regression  analysis  of  censored  survival  data,  is  provided  in  Section  5. 
The  asymptotic  distributions  of  the  estimators  are  obtained  in  SectioiTb. 

2  Semiparametric  estimators 

2.1  Notation  and  derivation  of  the  estimators 

Denote  by  (ij,  Zj,  Ti,  ^,)  the  observed  covariates  x,  and  z,,  possibly  censored 
failure  time  Tj,  and  censoring  indicator  6i,  for  the  ith  of  n  independent  and  identi- 
ceJly  distributed  individuals;  Si  =  1  if  Ti  is  uncensored.  In  the  usual  survi\'al  set-up 
with  non-informative  and  conditionally  independent  censoring  the  log-likelihood  for 
A  is 

^  l^ilogAi(Ti)  -  y  l[t<rj]Ai(<)(it|  , 

where  the  range  of  integration  extends  over  the  period  of  foilow-up  and  \i{t)  — 
A(t|xi,Zi).  Let  Ni{t)  —  l[r,<t,fi,=i]  he  the  corresponding  counting  process  and 
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Mi{t)  =  Ni(t)— fg  ds  the  associated  martingale.  Assume  that  A(-|x,  z)  is 

boimded  away  from  zero.  Consider  a  one-dimensional  parametric  submodel,  a(t)  = 
a(t;  rj),  in  which 


da(t) 


Differentiating  the  log  likelihood  with  respect  to  ^  and  rj  we  obtain  the  parametric 
score  function: 


Setting  1^  =  0  yields 


=  (/  Z'WZ  dt)-\J  Z'WdN  ~  J  Z'WX  dA),  (2.1) 


where  Z  =  Z(t)  =  ^  n-matrix  W  is  defined  by 

W{t)  =  diag{l/Ai(i)},  N{t)  =  {Ni{t),...,N„{t)y  and  X  is  defined  like  Z.  Next, 


ir,  =  iab  =  fb{iyX'WdM(t) 

=  /  b'X'WdN  -  /  b'X'WZp  dt-J  b'X'WX  dA, 

where  M  =  (Mi, . . . , M„)'. 

Setting  lab  =  0  for  a  sufficiently  large  collection  of  vector- valued  functions  b 
implies  that 

A{i)=  f  {X'WX)-\X'WdN -X'WZ^ds).  (2.2) 

Jo 

Substituting  the  right  hand  side  of  (2.2)  into  (2.1)  and  solving  for  gives 

=  (/  Z'HZ  dtf^  /  Z'H  dN,  (2.3) 

where  H  =  W  —  W X{X'W X)~^ X'W .  Now  ^  is  not  an  estimator  since  it  depends 
on  A,  which  is  unknown.  However  /?  is  the  solution  of  =  0,  where  is  the  efficient 
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score  for  as  shown  in  the  Appendix.  This  implies  that  an  estimator  based  on 
(2.3),  but  with  a  consistent  estimate  of  A  replacing  the  unknown  function,  will 
be  eificient  for  the  semiparametric  model;  see  Bickel,  Klasssen,  Ritov  and  Wellner 
(1992,  Theorem  3.4.1). 

The  use  of  the  identity  matrix  I  in  place  of  W  gives  an  initial  estimate  of 
To  construct  an  efficient  estimator  we  propose  two  methods.  These  both  replace  W 
by  PF  =  diag{l/A,(-)}  where  A,-  is  some  estimate  of  A,-.  The  second  method  is  more 
appropriate  when  the  dimension  of  /3  is  large. 

Method  I: 

(i)  Fit  the  full  Aalen  model,  A(t|a;,2)  =  q(<)'i  +  0{ty  z,  and  obtain  an  estimate  W 
of  W  from  a  historical  kernel  smoother,  as  in  Buffer  and  McKeague  (1991). 

(ii)  Find  an  estimate  of  ^  by  (2.3),  using  W  in  place  of  W . 

(iii)  Estimate  A  from  (2.2)  using  W  and  ^  in  place  of  W  and 

Method  II: 

(i)  Estimate  ^  inefficiently  from  (2.3)  using  /  in  place  of  W. 

(ii)  Estimate  A  inefficiently  from  (2.2)  using  I  in  place  of  W  zind  the  estimate  of 
^  from  (i),  and  then  use  historical  kernel  smoothing  to  estimate  a. 

(iii)  Obtain  an  estimate  W  of  W  using  the  estimates  of  ^  and  a  from  (i)  and  (ii). 

(iv)  Obtain  final  estimates  A  and  ^  using  (2.2)  and  (2.3)  with  W  in  place  of  W . 

In  our  computer  implementation  we  have  used  method  II  with  a  A,-  that  is 
explicitly  defined  in  subsection  2,4.  Notice  that  Z  and  X  are  functions  of  t  and 
that,  provided  the  covariates  are  predictable,  the  same  estimating  equations  (2.2) 
smd  (2.3)  could  be  used  with  time-dependent  covariates. 

The  gain  in  efficiency  of  the  two-step  estimator  using  W  compared  to  the  initial 
estimator  using  I  will  depend  on  the  heterogeneity  of  the  hazeird  of  individuals 
in  the  sample.  For  instance,  if  all  individuals  are  at  equal  risk,  so  that  none  of 
the  covariates  are  related  to  survival,  then  there  is  no  efficiency  gain.  In  general, 
however,  there  will  be  a  small  gain.  Buffer  and  McKeague  (1991)  investigated  by 
simulation  the  asymptotic  relative  efficiency  of  the  OLS  estimator  in  the  Aalen 
model  and  foimd  it  to  be  between  72%  and  98%  depending  on  the  distribution  of 
the  covariates  and.the  magnitude  of  the  risk  associated  with  them.  The  situation  is 
somewhat  more  complicated  here  because  our  estimate  of  ^  depends  on  the  weights 
for  all  individuals  at  risk  at  each  failure  time.  In  any  given  situation  one  can  however 
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fairly  easily  examine  the  efficiency  gain  by  comparing  the  asymptotic  covariance 
matrices  or  more  thoroughly  via  a  bootstrap  simulation. 

2.2  Estimating  the  asymptotic  covariance  matrix 

The  asymptotic  distribution  of  $  and  A  is  the  same  for  methods  I  and  II.  Stan¬ 
dard  coxmting  process  techniques  (Section  6)  can  be  used  to  show  that  —  /?) 

converges  in  distribution  to  a  p-variate  normal  with  mean  zero  and  with  covariance 
matrix  which  can  be  consistently  estimated  by  where  S  =  n~^  J  Z' HZ  dt. 
Here  H  is  the  estimated  version  of  H  obtained  by  replacing  IF  by  a  consistent  esti¬ 
mate  W.  The  same  approach  shows  that  —  A)  converges  in  distribution  to 

a  g-vziriate  Gaussiein  process  with  mean  zero  and  with  a  covariance  function  which, 
as  a  function  of  s  and  t,  can  be  consistently  estimated  by 

n  Y,  (2.4) 

u<aA< 

where  A„  is  the  jump  in  A  at  time  u  and 

/  {X'WXy^X'WZds. 

Jo 

The  first  term  in  (2.4)  is  a  consistent  estimate  of  the  covariance  function  for  the 
model  in  which  only  the  nonparametric  terms  are  non-zero;  the  second  term  repre¬ 
sents  the  contribution  from  the  parametric  part  of  the  model. 

2.3  Grouped  data  version 

Although  evaluation  of  the  estimates  is  fast  on  even  a  small  computer,  one  may 
wish  to  fit  a  grouped  data  version  of  the  model  in  the  exploratory  stage  of  model 
building.  For  most  purposes  grouping  the  time  axis  into  about  ten  intervals  will  be 
adequate  and  this  will  greatly  reduce  the  computation.  The  grouped  data  model 
may  be  written 

K 

A(t|a:,r)  =  ^  a(-)ili..(<)  + /?'z, 

1=1 

where  the  time  axis  has  been  divided  into  K  intervals  Ji , . . . ,  Ik  with  =  [Ti_i ,  t^) 
and  To  =  0.  One  approach  to  estimation  treats  this  as  a  parametric  linear  model 
with  Kq  -f  p  parameters,  but  even  for  moderately  large  K  it  makes  sense  to  take 
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into  account  the  orthogonality  of  the  dummy  covariate  blocks  i  =  1, .  . ,  K. 

Let  k(u)  denote  the  index  such  that  u  €  ^k(u)-  Proceeding  as  before  one  has 


^  «(t)7’|-  +  Ofc(u)(“  -  T{k(n)-1]) 

1=1 


and 


i)  =  [J  X'WX  dt 


r'h- 


WdN  -  X'WZ^du). 


Thus  instead  of  having  to  solve  a  system  of  (p  +  q)  linear  equations  at  each  failure 
time,  one  has  only  to  solve  such  a  system  for  each  time  interval. 


2.4  The  choice  of  weights 

Although  asymptotically  one  may  use  any  consistent  obtained  via  method 
I  or  II  to  estimate  the  efficient  weights,  in  practice  the  choice  of  A,-  needs  to  be 
made  with  some  care.  It  is  a  good  idea  to  compare  the  weighted  estimates  with 
the  unweighted  ones,  since  both  are  consistent  on  the  model,  and  this  can  provide 
a  check  of  whether  the  weights  axe  wildly  off-target. 

The  implementation  that  we  have  used  calculates  A,  as  follows.  Let  T(,)  denote 
the  ith  ordered  failure  time,  and  set  T(o)  =  0.  Given  initial  estimates  A  and  /?,  we 
estimate  a(t)  for  t  >  T(^a)  by 


«(<)  = - - 

i(.)  -  J-d-d) 


when  r(,)  <  t  <  T(,+i). 


We  have  found  that  taking  d  between  and  works  well  for  n  between  100 
and  1000.  Notice  that  Ai(f)  =  d:(<)'ii  -f  $'zi  estimates  A,(t),  but  it  cannot  always 
be  used  to  estimate  the  weights  since  it  may  be  non-positive  and  it  is  undefined  for 
Instead,  we  use 


w/'i  _  /  max(eA(<),A,(f))  for  t  >  T[a) 
~  \  X{t)  for  t  <  r(rf) 


where  A(t)  is  the  average  of  Xi{t  V  over  all  individuals  i  at  risk  at  time  t—. 

We  recommend  taking  e  between  0.15  and  0.35  in  defining  the  ‘minimum  edlowable 
heizard’  eA(t).  The  examples  in  this  paper  are  based  on  d  =  50  and  e  =  0.25.  Strictly 
speaking.  A;  departs  from  methods  I  aind  II  for  t  <  T(dd  but  this  will  have  negligible 
effect  provided  that  T(d)  is  smeill  compared  to  the  total  length  of  follow-up. 
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Many  variations  on  this  recipe  for  Aj  are  possible.  For  instzmce,  the  ‘bandwidth’ 
for  estimating  a  could  be  taken  to  be  a  fixed  length  of  time  or  a  fixed  number  of 
uncensored  failure  times. 

3  Model  refinement  and  diagnostics 

The  most  immediate  problem  faced  when  applying  the  partly  parametric  ad¬ 
ditive  risk  model  is  in  determining  which  of  the  covariates  should  be  modeled  non- 
paJ•^lmetrically.  There  may  be  scientific  reasons  for  wanting  to  include  a  particular 
variable  nonpajametrically.  It  is  possible  that  some  factor  will  not  be  significant 
when  modeled  parametrically  even  though  it  has  a  strong  effect  on  survival,  e.g.,  a 
drug  that  is  strongly  toxic,  but  which  helps  those  patients  who  survive  the  initial 
period  of  toxicity.  However,  if  there  are  a  large  number  of  covariates,  then  it  is 
advisable  to  start  by  treating  at  most  a  few  of  them  nonparametrically  and  the  rest 
parametrically.  It  is  generally  sensible  to  include  a  nonparametric  baseline.  The 
covariates  having  the  most  insignificant  effects  should  then  be  dropped  from  the 
model  one-by-one.  The  next  step  would  be  to  examine  whether  the  influence  of 
each  of  the  pzurametric  covariates  varies  with  time  by  treating  each  nonparamet¬ 
rically  and  looking  at  the  plot  of  Aj{t)  along  with  the  corresponding  straight  line 
estimate  t0j.  These  plots  together  with  pointwise  confidence  intervals  against  time 
will  give  some  indication  of  the  validity  of  the  parzunetric  assumptions  and  how 
they  are  violated  when  they  fail.  This  approach  is  illustrated  by  the  example  in 
the  following  section.  Other  approaches  are  possible.  For  instance,  one  might  fit  a 
separate  Aalen  model  for  each  covariate,  to  get  an  initial  idea  of  the  variation  of 
the  additive  hazard  with  time,  before  attempting  multivariate  modelling.  Alterna¬ 
tively,  after  selecting  a  pzirtly  parametric  model,  one  might  check  to  see  if  any  of 
the  variables  not  included  make  a  significant  nonpzirametric  contribution. 

Influence  residuals  can  give  useful  information  about  how  much  the  estimates 
would  change  if  an  individual  observation  were  removed  from  the  data  set.  For 
fixed  kF,  /?  as  defined  in  (2.3)  is  an  explicitly  defined  functional  of  the  empirical 
distribution  function.  Differentiating  this  functional  and  evaluating  the  derivative 
at  the  empirical  distribution  gives  the  empirical  influence  curve  (Cook  and  Weisberg, 
1982,  pp.  104-108).  Straightforward  differentiation  and  a  little  algebra  yields 

d0,  =  ifZ'HZdt)-^  f{zi  -  Z'WX{X'WX)-'^Xi]WudM, 
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as  the  influence  of  the  ith  individual  on  /3.  Here  Mi(t)  =  Ni{t)  —  l[7’,>aj(xj  dA  + 

$'zids)  is  the  martingale  residual  for  observation  i.  The  effect  of  estimating  W  on 
the  influence  curve  for  ^  is  asymptotically  negligible  whenever  the  assumed  model 
holds.  That  is,  if  the  influence  curve  is  evaluated  at  a  probability  measure  for  a 
partly  additive  Aalen  model  with  covariates  x  amd  z  then  the  above  expression  for 
the  O^i’s  will  be  correct  to  first  order. 

As  in  proportional  hazards  regression,  the  basic  diagnostic  building  block  is  the 
counting  process  martingale  residual  Mi(-)  (Barlow  and  Prentice,  1988;  Therneau, 
Grambsch  and  Fleming,  1990).  A  plot  of  the  against  t  can  be  used  to 

check  for  systematic  lack  of  fit  due  to  components  not  being  allowed  to  rary  freely 
in  time,  as  in  (1.2).  To  investigate  the  role  of  individual  cov^ariates  one  may  partition 
the  time  axis  into  about  ten  mtervals  [rj_i,  r^)  [j  =  1, . . . ,  J)  and  for  each  j ,  plot 
the  increments  Mi(Tj)  —  against  covariates  for  individuals  at  risk  at  rj_i. 

Such  plots  are  analogous  to  partial  residual  plots  in  the  linear  model  and  may  detect 
the  need  to  transform  a  covariate.  One  can  also  check  whether  the  additive  risk 
associated  with  a  given  covariate  varies  in  time  by  comparing  the  J  plots  for  that 
covariate,  after  rescaling  each  by  tj  —  Tj-\ . 

4  Example 

In  this  section  we  discuss  the  fitting  of  a  partly  parametric  additive  risk  model 
to  data  from  the  British  Medical  Research  Council’s  (1984)  fourth  myelomatosis 
trial.  We  analyzed  survival  data  on  495  myelomatosis  patients  for  whom  presenta¬ 
tion  measurements  included  serum  02  microglobulin  and  haemoglobin.  Percentiles 
of  these  measurements  are  given  in  Table  1.  In  fitting  the  regression  models,  serum 
02  microglobulin  was  transformed  by  logjo(-)  to  compensate  for  its  skewness. 

Several  studies  (e.g.  Cuzick,  Cooper  and  MacLennan,  1985)  have  indicated 
that  serum  02  microglobulin  is  of  primary  importance  in  predicting  survival  in 
myelomatosis  patients.  However,  a  recent  paper  of  Cuzick,  De  Stavola,  Cooper, 
Chapman  and  MacLennan  (1990)  suggests  that  its  value  is  confined  to  the  first  two 
years  of  follow-up.  This  claim  was  b2ised  on  an  analysis  using  separate  proportional 
hazards  models  for  different  follow-up  intervals.  Such  an  approach  has  limited 
ability  to  model  covariate  effects  that  vary  in  their  influence  over  time.  We  think 
that  it  is  more  appropriate  to  apply  a  partly  parametric  additive  risk  model  when 
searching  for  such  variations. 
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Table  1.  Percentiles  of  serum  ^2  microglobulin  and  haemoglobin. 


Covariate 

min 

10 

25 

50 

75 

90 

max 

serum  02 

0.3 

2.3 

3.3 

5.7 

9 

22 

76.7 

haemoglobin 

25 

71 

90 

106  122 

136 

167 

We  initially  treated  the  covariates  parametrically  and  the  baseline  nonpara- 
metricedly,  as  in  model  (1.2).  The  Wald  statistics  for  testing  whether  the  corre¬ 
sponding  parameters  are  zero  were  2.25  for  serum  ^2  microglobulin  and  —3.24  for 
haemoglobin.  Thus  there  is  strong  evidence  that  both  covariates  are  influential. 

Next  we  considered  the  model  with  haemoglobin  treated  parametrically,  and 
the  baseline  hazard  and  serum  ^2  microglobulin  treated  nonparametrically.  This 
turned  out  to  be  our  final  model.  Figures  1  and  2  show  plots  of  the  cumulative  risks 
for  the  two  nonparametric  terms;  Figure  2  also  contains  the  straight  line  estimate  of 
the  cumulative  risk  for  serum  ^2  microglobulin  based  on  model  (1.2).  Note  that  in 
the  first  three  years  the  straight  line  falls  outside  the  95%  confidence  limits,  strongly 
suggesting  that  the  influence  of  serum  02  microglobulin  varies  with  time.  Further 
inspection  of  Figure  2  indicates  a  plateau  in  the  nonparametric  cumulative  risk 
estimate  after  about  two  yeeirs,  which  is  consistent  with  the  claim  of  Cuzick  et  al. 
(1990)  that  serum  02  microglobulin  is  of  primary  importance  in  predicting  survival 
only  within  the  first  two  years  of  follow-up.  Haemoglobin  treated  parametrically 
has  significant  influence  (the  Wald  statistic  is  —3.13)  and  from  Figure  3  we  see  that 
its  influence  does  not  vary  appreciably  with  time  since  the  straight  line  estimate  of 
cumulative  risk  is  almost  completely  contained  in  the  95%  confidence  limits  around 
the  nonparametric  estimate  based  on  the  full  Aalen  model. 

It  is  noticeable  from  Figure  2  that  the  confidence  intervals  inevitably  become 
wider  with  time.  Suppose  one  looked  at  survival  beyond  2  years:  i.e.,  A{t)  —  A(2) 
for  t  >  2.  In  that  way  the  intervals  would  have  zero  width  at  2  years  and  would  be 
n2irrower  at  5  years.  There  would  be  a  second  set  of  bands,  identical  to  the  ones 
in  Figure  2,  for  0  to  2  years.  It  would  seem  sensible  that  the  two  sets  of  inter\’als 
should  be  made  wider  to  allow  for  the  implicit  multiple  testing  that  is  taking  place: 
(i)  A(t)  =  tb  for  0  <  t  <  2  and  (ii)  A{t)  —  A{2)  —  —  2)b  for  2  <  /  <  6. 
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For  the  purpose  of  predicting  survival  based  on  our  final  model,  one  can  use 
the  estimate  ^ 

5'(t|i,  z)  =  exp  I  —  i:  (x'  dA  +  z' P  ds)| 

of  the  survival  function  5(t|i,2)  =  pr(r  >  t|x,2)  at  given  values  of  the  covariates. 
In  Figure  4  we  have  plotted  the  average  predicted  survival  probabilities  for  groups 
defined  in  terms  of  the  lower/upper  quartiles  of  the  covariates.  Note  that  patients 
with  low  haemoglobin  and  high  serum  ^2  microglobulin  are  at  the  highest  risk, 
whereas  patients  with  high  haemoglobin  and  low  serum  ^2  microglobulin  are  at  the 
lowest  risk. 

A  less  than  pleasing  feature  of  the  curves  in  Figure  4  is  that  they  are  not 
monotone.  Since  each  curve  is  an  averaged  estimates  of  survival  functions  it  would 
be  sensible  to  taJce  a  monotone  version  as  the  final  curve.  This  could  easily  be 
achieved  by  isotonically  regressing  S{t\x^z)  against  t.  We  have  not  chosen  to  do 
that  here  in  order  to  show  that  the  lack  of  monotonicity  is  only  very  slight.  Indeed 
an  estimated  survival  curve  with  significantly  increasing  sections  would  indicate  a 
lack  of  fit,  since  it  is  known  that  on  the  model  the  estimate  is  consistent  for  the 
true  survival  function. 

As  a  rough  check  of  goodness-of-fit  it  is  useful  to  compare  the  various  model 
based  estimates  of  survival  probability  with  the  local  Kaplan-Meier  estimate.  In 
Figure  5  we  plot  the  local  Kaplan-Meier  estimate  for  the  high  heiemoglobin  and  low 
serum  ^2  microglobulin  group  (Hb  >  122,  serum  ^2  <  3.3)  and  compare  it  with  the 
average  survival  probabilities  predicted  by  the  different  models.  It  appeeirs  that  our 
model  offers  a  much  better  fit  than  either  the  Cox  model  or  the  Lin-Ying  model. 

In  this  example  the  relative  efficiencies  of  our  estimators  compeired  to  the  OLS 
estimators  were  114%  and  120%  for  the  baseline  and  serum  /?2  cumulative  hazard 
functions  at  four  years,  and  112%  for  the  parameter  corresponding  to  haemoglobin. 

[Insert  Figures  1-5  about  here] 

Figure  1.  Estimate  of  the  bziseline  cumulative  risk  ( - )  with  95%  confidence 

limits  ( - )  based  on  the  final  model. 

Figure  2.  Estimate  of  the  cumulative  risk  for  serum  02  ( - )  with  correspond¬ 
ing  95%  confidence  limits  ( - )  based  on  the  final  model.  The  straight  line 
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estimate  is  obtained  from  the  model  in  which  serum  3^  well  as  Hb,  axe  treated 
parametrically  . 

Figure  3.  Estimate  of  the  cumulative  risk  for  Hb  ( - )  with  corresponding 

95%  confidence  limits  ( - )  based  on  the  “full”  Aalen  model.  The  straight  line 

estimate  is  obtained  from  the  model  with  Hb  treated  parametrically  and  serum  02 
and  the  baseline  treated  nonpaxametrically. 

Figure  4.  Average  predicted  survival  probabilities  according  to  risk  group. 

Figure  5.  Local  Kaplan-Meier  estimate  of  survival  probability  for  the  high  Hb/low 
serum  02  group,  compared  with  various  model-based  estimates  averaged  over  this 
group. 

We  have  also  applied  our  approach  to  data  on  559  patients  from  the  British 
Medical  Research  Council’s  fifth  myelomatosis  trial.  In  this  case  we  used  indi¬ 
cators  for  treatment,  sex  and  four  age  strata  as  covariates  entering  parametri¬ 
cally.  The  treatment  was  a  trial  drug  regimen  that  was  compared  to  conventional 
chemotherapy.  The  baseline  and  serum  02  were  handled  nonparametrically  as  be¬ 
fore.  Haemoglobin  was  tried  parametrically,  but  did  not  turn  out  to  be  a  significant 
covariate,  so  was  dropped  from  the  model.  The  shape  of  the  serum  02  cumulative 
hazzu'd  curve,  given  in  Figme  6,  is  remarkably  simileu:  to  that  in  the  fourth  trial 
(Figure  2):  the  straight  line  estimate  again  falls  outside  the  95%  confidence  inter¬ 
vals  and  there  appears  to  be  a  plateau  after  about  2.5  years.  The  curve  is  plotted 
only  up  to  3.5  years,  which  is  as  far  as  we  can  go  with  data  from  the  fifth  trial. 
Figure  7  justifies  our  handling  of  the  treatment  parametrically:  the  treatment  effect 
appears  to  be  constant  in  time  since  the  straight  line  falls  within  the  95%  confidence 
intervals  for  the  nonparametric  cumulative  hazard. 

[Insert  Figures  6  aind  7  about  here] 

Figure  6.  Fifth  myelomatosis  trial  based  estimates  of  the  cumulative  risk  for  serum 
02‘,  compare  with  Figure  1. 

Figure  7.  Fifth  myelomatosis  trial  based  estimates  of  the  cumulative  risk  for  the 
treatment  effect. 


12 


The  Wald  statistic  for  testing  for  a  treatment  effect  was  —2.99,  suggesting 
that  patients  receiving  the  drug  regimen  had  significantly  better  survival.  At  two 
years,  the  predicted  effect  of  treatment  is  to  increase  the  probability  of  survi\'al 
by  approximately  30%  over  what  it  would  be  for  an  individual  on  conventional 
chemotherapy. 

5  Discussion 

The  standard  method  for  regression  analysis  of  survival  data  at  present  is 
the  proportional  hazards  model  with  exponential  link  function  (Cox,  1972).  Some 
compeirisons  between  this  and  the  present  model  seem  in  order. 

Consider  first  the  simplest  case  of  a  single  binary  covariate  representing  two 
samples.  The  nonparametric  additive  model  permits  nonparametric  estimation  of 
the  survival  function  in  each  sample  separately.  The  Cox  model  permits  a  single 
nonparametric  baseline  hazard  function  and  assumes  that  the  hazard,  at  any  time 
t,  in  one  sample  is  always  a  common  multiple  of  the  hazard,  at  the  same  time  t,  in 
the  other  sample.  An  additive  model  with  a  nonparametric  baseline  and  parametric 
covariate  effect  is  similzu:  to  the  Cox  model,  except  that  the  difference  between  the 
two  hazard  functions  is  constant  over  time  (Table  2). 

Table  2.  Model  assumptions  for  the  two  sample  problem. 


Aalen:  Ai,  A2  unspecified 

Cox;  A2(<)  =  ^Ai(<),  Ai  unspecified 

New:  A2(t)  =  Ai(<)  +  6,  Ai  unspecified 


More  generally,  comparison  between  the  Cox  eind  the  peirtly  parametric  additive 
model  is  simplest  when  the  only  nonparametric  component  is  the  baseline.  In 
that  case  the  Cox  model  has  A(t|^)  =  Xo{t)exp{ffz)  and  the  additive  model  has 
X(t\z)  =  Ao(t)  +  z  .  Such  models  have  been  considered  before  (e.g.,  Cox  and 
Oakes,  1984,  p.74). 

The  full  flexibility  of  the  semiparametric  additive  model  is  seen  by  comparing 
it  to  the  stratified  Cox  model.  Given  two  one-dimensional  of  covariates,  x  and  z, 
one  may  describe  the  stratified  Cox  model  by 

X{t\x,z)  =  X{t\x)  exp{l3' z) 
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A(<|x  =  k)  =  Xk(t)  assuming  a:  €  {1,2, . .  .,K} 
and  the  additive  model  by 

A(<|i,z)  =  A(t|x)  +  ^'z 
A(<|x  =  fc)  =  a(t)'x 

=  Ait(t),  the  fcth  component  of  a(t) 

when  X  is  a  vector  of  K  dummy  variables. 

It  is  possible  to  generalise  the  Cox  model  so  that  it  is  directly  comparable  to 
our  partly  parametric  Aalen  model.  Consider  the  Cox-type  model 

A(t|x,z)  =  Ao(t)exp(ci;(<yx  + 

with  time- dependent  coefficients  a(t)  and  time-independent  coefficients  /?.  This  is  a 
partly  parametric  version  of  a  model  studied  by  Zucker  and  Karr  (1990).  Note  that 
exp{a(/)'(x2  —  xi)}  can  be  interpreted  as  the  time-specific  relative  risk  between  an 
individual  with  covariates  (x2,z)  and  one  with  (xj  ,2).  The  unknown  a  and  /?  can 
be  estimated  via  a  histogram  sieve  approach:  treat  a  as  a  step  function,  constant 
on  each  of  K  intervals  Ij  that  partition  the  follow-up  period,  cf.  the  grouped  data 
version  of  our  model.  This  gives  a  standard  Cox  model  problem  with  Kq  +  p 
covariates  defined  by  the  Kq  components  of  the  xljj  and  the  p  components  of  z. 
Asymptotic  theory,  with  K  as  well  as  n  tending  to  infinity,  for  the  resulting  sieve 
estimators  can  be  developed  along  the  lines  of  Murphy  and  Sen  (1991),  who  studied 
the  fully  time- dependent  case.  We  shall  not  pursue  this  here,  but  we  note  that  the 
asymptotic  theory  is  considerably  more  compHcated  to  develop  than  with  the  partly 
parametric  additive  risk  model. 

6  Asymptotic  distributions 


In  this  section  we  find  the  asymptotic  distribution  of  our  estimators  $  and  A. 
This  is  done  under  conditions  stated  in  McKeague  (1988a)  or  Huffer  and  McKeague 
(1991);  in  particular,  the  covariates  are  assumed  to  be  bounded  and  A(-|x,2)  is 
assumed  to  be  bounded  away  from  zero.  The  follow-up  period  is  taken  to  be  a  fixed 
bounded  interval. 

We  begin  by  noting  that 


J  Z'H  dM  =  J  Z'H  dN-  J  Z'HX  dA-  J  Z'HZ  dt  0 
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(5.1) 


Z'HdN-  j  Z'Szdtff 

since  H  is  orthogonal  to  X.  Hence 

(n-^  J  Z'HZdty\-'^^^  J  Z'HdM, 

provided  the  inverse  matrix  exists.  This  will  allow  us  to  obtain  the  asymptotic 
distribution  of  $  for  any  predictable  W  that  is  a  uniformly  consistent  estimate  of 
W,  via  the  martingale  central  limit  theorem.  For  now  suppose  that  the  weights  are 
computed  via  method  I,  in  which  case  W  is  predictable.  The  method  II  weights 
axe  not  predictable  because  they  are  a  function  of  the  initial  estimate  of  /3,  both 
explicitly  and  through  dependence  on  the  initial  estimate  of  a,  so  some  additioned 
work  will  be  needed  in  that  case. 

Let  Y  =  (X,  Z).  As  a  consequence  of  the  independent  and  identically  dis¬ 
tributed  replicates,  n~^Y'WY  converges  in  probability  to  a  nonrandom  matrix 
function  uniformly  over  bounded  time  intervals.  This  function  is  assumed  to  be 
nonsingular  and  smooth.  We  apply  the  martingale  central  limit  theorem  to  the 
martingale  Jo  Z'H  dM,  which  has  predictable  variation 

Z'HW-'^H'Zds. 

Routine  matrix  algebra  gives  HW~^H’  =  H.  Also,  n~^ Z' H{W~^  —  W~^)HZ 
converges  uniformly  in  probability  to  zero,  cf.  McKeague  (1988b,  Lemma  4.3).  Let 
E  denote  the  limit  in  probability  of  f  Z'HZdt.  By  uniform  consistency  of 
H  and  boundedness  of  the  covariates,  the  matrix  n~^  f  Z'HZ  dt  also  converges  in 
probability  to  S.  It  follows  from  (5.1)  that  —  P)  converges  in  distribution  to 

a  mean  zero  multivariate  normal  with  variance 
From  (2.2)  and  the  definition  of  A, 

n^/2(i- A)  =  /  {X'WXy'^X'WdM 

_  _  (5.2) 

{X'WX)-'^X'WZdtn^^^0  - 13), 

provided  the  inverse  matrix  exists  at  all  <;  if  not,  an  additional  term  of  order  op(l) 
is  required.  Once  again  we  can  apply  the  martingale  central  limit  theorem.  The 
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covariation  between  ^X'W dM  and  n  JJ,  ZH  dM  is 

(X'WX)-^  X'WW-^  H'  Z  dt, 

which  converges  in  probability  to  a  matrix  of  zeros,  by  the  uniform  consistency  of 
W  and  the  orthogonality  of  H  and  X.  This  implies  that  the  two  terms  on  the  right 
side  of  (5.2)  are  asymptotically  independent.  Let  V  denote  the  limit  in  probability 
of  n~^X'WX.  As  in  Buffer  and  McKeague  (1991),  the  first  term  in  (5.2)  converges 
in  distribution  to  a  Gaussian  martingale  m  with  covariation  process  V~^  dt.  This 
is  simply  the  limit  of  —  A)  in  the  usual  additive  risk  model  in  which  ^  =  0. 

It  follows  that  n^^^(A  —  A)  converges  in  distribution  to  m  +  where  m  and 

^  are  independent,  (  is  mean  zero  multivariate  normal  with  variance  S~^,  and 
V>(t)  =  fg  V~^U  ds  where  U  is  the  limit  in  probability  of  n~^ X'WZ . 

It  remains  to  show  that  the  above  argument  cam  be  modified  to  allow  for 
the  non-predictability  of  W  when  the  weights  are  computed  via  method  II.  First 
consider  the  last  part  of  (5.1),  f  Z'H  dM,  each  component  of  which  can  be 

written  in  the  form 

Gi0)dMi  (5.3) 

where  Gi(0)  =  Gi(0,t)  is  predictable,  twice  differentiable  in  0,  and  0  is  the  initial 
estimate  of  0.  Taylor  expanding  about  the  true  0,  we  can  express  (5.3)  as 

jGiWdMi  +  n'/\0-l3)'n-'Y,  jGiWm 

+  -  ^)}'|n-’  Y,  j Ci(r)  -  0)] 

where  0*  lies  on  the  line  segment  between  0  and  0,  and  the  dependence  of  0*  on 
t  Eind  i  has  been  suppressed.  The  first  term,  having  predictable  integrands,  can  be 
treated  using  the  martingale  central  limit  theorem  as  before.  The  second  term  is 
easily  shown  to  converge  to  zero  in  probability  since  0  is  n^/^-consistent  and  the  in¬ 
tegrand  is  predictable  and  uniformly  bounded.  The  third  term  is  also  asymptotically 
negligible  since  Gi{0)  is  uniformly  bovmded  for  0  belonging  to  a  neighbourhood  of 
00.  Note  that  we  have  been  using  the  boundedness  of  covariates  and  the  assumption 
that  A  is  bounded  away  from  zero.  A  similar  argument  applies  to  the  first  term  on 
the  right  side  of  (5.2).  We  conclude  that  the  asymptotic  distribution  of  0  and  .4  is 
the  same  for  methods  I  and  II. 
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Appendix:  The  efficient  score  for  p 


The  efficient  score  for  0  is  obtained  by  projecting  the  score  onto  the  orthog¬ 
onal  complement  of  the  tangent  space  spanned  by  the  range  of  the  score  operator 
la-  When  n  =  1,  it  will  be  given  by 


2  -  b*{tyx 
A(t|x,2) 


dM(t\x,  z) 


for  some  b*  such  that  is  orthogonal  to  lab  for  all  b  such  that 


E[S{b{T)'z/\{T\x,z)y]  <  oo. 


Thus  for  all  such  6, 


Q  =  e\  f  ?  dM{t)  [  dM{t) 

[J  A(tlx,2)  ^  J  A(tli,2) 


A(Tla:,2)  A(T|x,2)J  ’ 

see,  for  example,  Sasieni  (1992,  Lemma  A.l).  Hence 


see  Sasieni  (1992,  section  3). 

Thus,  for  a  sample  of  size  n, 

l}  =  J  Z'WdN  -  J {Z'WX){X'WX)-'^X'W  dN 

-  J  Z'WXdA  +  j{Z'WX){X'WX)-^X'WXdA 

-  J  Z'WZdt0  +  j{Z'WX){X'WX)-KX'WZ  dt  0 
=  j  Z'HdN-  J  Z'HZdt0. 

Solving  /J  =  0  for  0  gives  (2.3). 

We  conclude  that  the  estimators  0  and  A(-)  discussed  in  this  paper  are  asymp¬ 
totically  efficient  for  the  semiparametric  model  (Bickel  et.  al.,  1992). 
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